Image characterization

ABSTRACT

A method of analysing a sequence of images, for example a sequence of images from a video signal, in which the amount the image changes between two images in a sequence is used to classify the sequence as being either a cartoon or a non cartoon sequence.

This application is the US national phase of international application PCT/GB01/04962 filed 8 Nov. 2001 which designated the U.S.

BACKGROUND

1. Technical Field

This invention relates to a method of and apparatus for characterising a sequence of images. One aspect of the invention relates to a method of and apparatus for determining whether a sequence of images, such as a sequence of frames of a video signal, represents an animated cartoon.

2. Related Art

With the growing availability of online data, provision of hundreds or even thousands of data channels by an information provider causes problems of content management and verification, as manual checking of every piece of data becomes infeasible. For image data, there is increasing interest in techniques for automated image interpretation and classification. Automated image interpretation and classification can provide indexing, cataloging and searching of still image or moving image databases. Image interpretation and classification can be done either by the service provider or by the service receiver.

One method of image classification is to analyse the content of each individual frame. Another method is to compare images from a sequence of frames with each other.

BRIEF SUMMARY

According to the present invention there is provided a method for characterising a sequence of images represented by a plurality of pixels whose intensity and/or colour can change with time, each pixel having a pixel value indicative of the intensity and/or colour of the pixel, the method comprising the steps of:

(i) determining the temporal rate of change of the pixel value for each pixel of a group of pixels;

(ii) combining the determined rates of change for each of the pixels so as to provide a combined rate of change value for the sequence of images, the combined rate of change having a plurality of temporal frequency components associated therewith;

(iii) determining the size of at least some of the temporal frequency components associated with the combined rate of change; and,

(iv) characterising the sequence of images in dependence upon the sizes of the frequency components.

The term “intensity” will be understood to include luminosity, strength or other value indicative of the brightness with which a pixel when displayed can be perceived by the human eye.

It has been appreciated that image sequences displaying different types of subject matter have different temporal frequency components. For example, the amount of movement in a sequence of images has been found to affect the extent to which different temporal components are present. Thus, a sequence of images showing a stationary person talking will normally have frequency components whose amplitudes relative to one another are very different to those of a sequence of images showing a violent scene in a film. Hence by obtaining at least some of the temporal frequency components in a sequence of images, it is possible to characterise the sequence of images.

In particular, it has been found that the frequency components of a sequence of images in an animated cartoon are very different from those of a sequence of images which is not from an animated cartoon. Therefore, in one embodiment, the sequence of images is characterised as being an animated cartoon or not being an animated cartoon. This will help parents to stop children from downloading videos from the Internet or from watching TV programs other than cartoons. However, the method could be used to classify or otherwise characterise a sequence of images in other ways. For example, the classification of pornographic images or recognition of particular people could prove useful.

It will be appreciated that the absolute size of the frequency components making up the combined rate of change need not be determined, and that in many situations only the sizes of the frequency components relative to one another are important. The relative sizes of the different frequency components can then represent a temporal spectrum. Normally, the size of a frequency component will be measured by its amplitude or magnitude.

The rate of change of a pixel value will preferably be the first derivative of the pixel value with respect to time, but the rate of change may be the second or yet higher order derivative of the pixel value with respect to time.

The rates of change may be combined by simply taking the sum of the rates of change, or by taking a weighted average of the rates of change for different pixels.

The group of pixels may be distributed in a spaced apart fashion over an area. Alternatively, the group of pixels may be formed by pixels which neighbour one another.

Preferably the method further comprises the step of partitioning the image into a plurality of subimages; and in which the combining step comprises the sub steps of combining the determined rate of change for the plurality of pixels in a subimage to provide a subimage rate of change; and subsequently combining said subimage rates of change to provide said value.

Preferably the rate of change of the value for each pixel is determined by calculating the difference between the value for a pixel for one image and the value of a corresponding pixel for a previous image.

Preferably the combining step includes the sub step of determining the proportion of pixels in an image of the sequence which have a value which is substantially different from the value of the corresponding pixel in a previous image of the sequence.

The spectrum of the combined rate of change of a plurality of pixels may be determined using a Fourier transform.

A discrete cosine transform may be used to provide a plurality of values which characterise the spectrum.

According to another aspect of the invention there is also provided apparatus for characterising a sequence of images represented by a plurality of pixels, each pixel having a pixel value indicative of its intensity and/or colour, the apparatus comprising: means for determining the temporal rate of change of the pixel value for each pixel of a group of pixels; means for combining the determined rates of change for each of the pixels so as to provide a combined rate of change value for the sequence of images, the combined rate of change having a plurality of temporal frequency components associated therewith; means for determining the size of at least some of the temporal frequency components associated with the combined rate of change; and, means for characterising the sequence of images in dependence upon the sizes of the frequency components.

According to yet another aspect of the invention, there is provided a method for classifying whether a sequence of images represents a cartoon, in which each image comprises a plurality of pixels, each pixel having a value representative of the intensity and/or colour of the pixel, the method comprising the steps of:

- for a plurality of pixels in an image, determining the rate of change of the value for each pixel for a plurality of images of a sequence of images;
- combining the determined rate of change of the plurality of pixels to provide a combined rate of change value for said plurality of images;
- determining the sizes of the frequency components of the combined rate of change value; and
- classifying the sequence of images in dependence upon said sizes of the frequency components.

According to a further aspect of the invention, there is provided an apparatus for determining whether a signal representing a sequence of images represents an animated cartoon, the apparatus comprising:

- means (71) for determining the rate of change of the value for a pixel in an image, the value being representative of the intensity and/or colour of the pixel;
- means (81) for combining the determined rate of change to provide a combined rate of change;
- means (77) for determining the sizes of the frequency components of the combined rate of change; and
- means (80) for classifying the signal in dependence upon said sizes of the frequency components.

Preferably the apparatus further comprises a segmenter (70) for partitioning an image of the sequence into a plurality of subimages.

Preferably the combiner (81) comprises means (72) for determining the proportion of pixels in an image which are substantially different from the value of the corresponding pixel in a previous image of the sequence.

Preferably the apparatus further comprises a discrete cosine transformer (78) for characterising the spectrum.

The invention also includes a data carrier loadable into a computer and carrying instructions for causing the computer to carry out the method of the invention and for enabling a computer to provide the apparatus according to the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

An embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:

FIG. 1 is a schematic representation of a computer loaded with software embodying the present invention;

FIG. 2 is a flow chart showing the method steps performed in one embodiment of the invention by the software illustrated in FIG. 1;

FIG. 3 shows a graph of the percentage of changed pixels both before and after scene change filtering;

FIG. 4 illustrates how a Discrete Fourier Transform is applied to sequential windows of samples;

FIG. 5 illustrates the difference between spectrums for a cartoon sequence of images and a non cartoon sequence of images;

FIG. 6 demonstrates how the number of Discrete Cosine Transform coefficients affects the classification error rate; and

FIG. 7 is a functional block diagram of the program elements that comprise the software indicated in FIG. 1.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

FIG. 1 illustrates a conventional computer 101, such as a Personal Computer, generally referred to as a PC, running a conventional operating system 103, such as Windows (a Registered Trade Mark of Microsoft Corporation), and having a number of resident application programs 105 such as a word processing program, a network browser and e-mail program or a database management program. The computer 101 also includes an image sequence classification program 109 that enables a signal representing a sequence of images to be classified according to whether the signal represents an animated cartoon. The program 109 has access to a temporary store 104 for storing program variables during execution of the program 109. The computer 101 is also connected to a conventional disc storage unit 111 for storing data and programs, a keyboard 113 and mouse 115 for allowing user input and a printer 117 and display unit 119 for providing output from the computer 101. The computer 101 also has access to external networks (not shown) via a network card 121.

As shown in FIG. 2, in accordance with a method of the present invention, at step 8 an input signal representing a sequence of images, for example a sequence of frames of video data, each image comprising a plurality of pixels, is received. The received signal has components representing a value in the range 0 to 255 for a red component (R), a blue component (B) and a green component (G) for each of the plurality of pixels which comprise each image in the sequence. At step 10 a current image in the sequence is divided into a plurality of subimages, each subimage representing part of the image. In this embodiment of the invention the image is divided into non-overlapping rectangular subimages of substantially equal size, any difference in size generally being due to rounding the required number of pixels to a whole number. The subimages could equally well be overlapping, of unequal size or of arbitrary shapes. However, the subimages should be substantially the same as a corresponding subimage for each image in the sequence.
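
By way of illustration only, the partitioning of step 10 might be sketched in Python as follows. The function name partition_bounds and the grid parameters cols and rows are assumptions introduced for this sketch; the embodiment does not specify how many subimages are used or how the rounding is performed.

```python
def partition_bounds(width, height, cols, rows):
    """Return (x0, y0, x1, y1) bounds of non-overlapping rectangular subimages.

    x1 and y1 are exclusive. Integer division places the cut positions, so
    subimage widths (and heights) differ from one another by at most one pixel.
    """
    xs = [c * width // cols for c in range(cols + 1)]
    ys = [r * height // rows for r in range(rows + 1)]
    return [(xs[c], ys[r], xs[c + 1], ys[r + 1])
            for r in range(rows) for c in range(cols)]


# Hypothetical 10 x 7 pixel image split into a 3 x 2 grid of subimages.
bounds = partition_bounds(10, 7, 3, 2)
```

Because the same bounds are reused for every image, each subimage corresponds to the same region throughout the sequence, as the embodiment requires.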

The red, green and blue component values are stored in the temporary store 104 (FIG. 1) for each subimage of the current image. At step 12 the amount by which a pixel value has changed between the current image and a previously received image in the sequence is calculated. In this embodiment of the invention the amount of change between corresponding pixels in different images is calculated between a particular image i and the immediately preceding image in the sequence, i.e. the image i−1. However, the amount of change could equally well be calculated between the current image and any previously received image in the sequence, i.e. the image i−n, in order to determine the rate of change of the value of the pixels.

At step 14 the respective rates of change of the pixel values for the subimage are combined with each other, and with the values calculated for other subimages, as follows. The rate of change of the value of pixels is used to determine a percentage of pixels which have a substantially different value from the corresponding pixel in a previous image in the sequence.

The percentage of changed pixels is calculated as follows:

For each pixel in the subimage, if

$$|R(x,y)_{t} - R(x,y)_{t-1}| + |G(x,y)_{t} - G(x,y)_{t-1}| + |B(x,y)_{t} - B(x,y)_{t-1}| > \text{threshold}$$

then dsub = dsub + 1.

Here $R(x,y)_{t}$ is the value of the red component for the pixel at position (x,y) in the image at time t (the current image in the sequence), and $R(x,y)_{t-1}$ is the value of the red component for the pixel at position (x,y) in the image at time t−1 (the previous image in the sequence). Similar notation is used for the green and blue components. dsub is initially set to zero and is used to keep a running total of the number of pixels which are substantially different from corresponding pixels in a previous image in the sequence. In order to calculate a percentage, dsub is simply divided by the total number of pixels in the subimage and multiplied by one hundred. threshold is an empirically set value which determines whether one pixel is deemed to have a value which is substantially different from that of another pixel.
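
The calculation of the percentage of changed pixels for one subimage may be sketched as below. This is a minimal sketch only: the function name percent_changed, the representation of a subimage as a list of (R, G, B) tuples and the threshold value of 30 are assumptions made for illustration, the embodiment stating only that threshold is set empirically.

```python
def percent_changed(curr, prev, threshold):
    """Percentage of pixels whose summed absolute R, G and B differences
    between the current and previous image exceed threshold.

    curr and prev are equally sized lists of (R, G, B) tuples for one subimage.
    """
    dsub = 0  # running total of substantially different pixels
    for (r1, g1, b1), (r0, g0, b0) in zip(curr, prev):
        if abs(r1 - r0) + abs(g1 - g0) + abs(b1 - b0) > threshold:
            dsub += 1
    return 100.0 * dsub / len(curr)


# Hypothetical four-pixel subimage at times t-1 and t.
prev = [(10, 10, 10), (200, 0, 0), (0, 0, 0), (50, 60, 70)]
curr = [(12, 10, 10), (0, 200, 0), (0, 0, 0), (90, 60, 70)]
pct = percent_changed(curr, prev, threshold=30)
```

In this example the summed differences are 2, 400, 0 and 40, so two of the four pixels exceed the assumed threshold.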

It will be appreciated that the rate of change of the pixel value may be obtained from the difference between the pixel value in an image and the pixel value of the same pixel in a subsequent image, without necessarily involving the step of dividing the difference in the pixel value by the time separation of the two images. This may be the case for example when comparing or classifying sequences of images which are formed by images at regular time intervals, in particular if the time interval between the images is the same for the different sequences.

The percentages of changed pixels for all the subimages are then combined in order to provide a combined rate of change value corresponding to a combined measure of the rate of change of pixel values for a particular image. In this embodiment of the invention the percentages are summed. In other embodiments the combined value from each subimage could be weighted, for example, by a weighting value indicating the importance of each subimage, which may be calculated as described in our co-pending European applications number 00302699.4 or 0031262.2.
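
A minimal sketch of this combining step is given below, covering both the plain sum used in this embodiment and the weighted alternative. The function name combine and the example weights are assumptions; how importance weights would actually be derived is not reproduced here.

```python
def combine(percentages, weights=None):
    """Combine per-subimage percentages into one rate of change value.

    With no weights the percentages are simply summed; otherwise each
    percentage is multiplied by its (hypothetical) importance weight first.
    """
    if weights is None:
        return sum(percentages)
    if len(weights) != len(percentages):
        raise ValueError("one weight is required per subimage")
    return sum(w * p for w, p in zip(weights, percentages))


plain = combine([10.0, 20.0, 30.0])
weighted = combine([10.0, 20.0, 30.0], weights=[1.0, 0.5, 0.0])
```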

Steps 10 to 14 are repeated for all of the images in the sequence, thus providing a sequence of combined rate of change values, each combined rate of change value corresponding to a particular image at a particular time. It is equally possible to calculate rate of change values at times which do not correspond to a particular image in the sequence (for example using interpolation) and it is not necessary to calculate a combined rate of change value for each and every image in the sequence. It will be appreciated that images at the beginning of the sequence will not have a rate of change value calculated if there is no corresponding previous image with which to compare pixel values.

At step 16 scene change filtering is performed. When a scene changes, the value of virtually every pixel in the scene changes significantly. Hence the rate of change value corresponding to a scene change causes a significant ‘spike’ in the sequence.

An example of this is shown in FIG. 3. In order to remove these spikes, any rate of change values above a certain threshold are simply deleted from the sequence.
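
The scene change filtering of step 16 might be sketched as follows. The function name remove_scene_changes and the spike threshold of 90 are assumptions chosen for illustration; the embodiment does not state the threshold used.

```python
def remove_scene_changes(values, spike_threshold):
    """Delete rate of change values above spike_threshold from the sequence."""
    return [v for v in values if v <= spike_threshold]


# Hypothetical sequence of combined rate of change values with two spikes.
filtered = remove_scene_changes([5, 7, 95, 6, 99, 4], spike_threshold=90)
```

Note that deleting values shortens the sequence, so the filtered values are no longer at strictly regular time intervals around each deleted spike.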

After scene change filtering at step 16, a Discrete Fourier Transform (DFT) is applied to a series of subsequences of the values at step 18. In this embodiment of the invention the DFT is applied to a subsequence comprising 50 values in order to provide a spectrum of the frequencies of the rate of change values. In other embodiments a preprocessing filter, for example a Hamming window, may be applied to the sequence of values prior to calculation of the DFT in order to remove spurious edge effects. FIG. 4 illustrates how the DFT is applied to a subsequence comprising 50 values which overlaps by 25 values with the next subsequence of values to which the DFT is applied.
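
The windowed DFT of step 18 may be sketched as below using only the Python standard library. The window length of 50 and the overlap of 25 come from the embodiment; the function names, the use of the magnitude of each bin, and the decision to keep only bins 0 to n/2 (the remainder being redundant for real input) are assumptions of this sketch.

```python
import cmath
import math


def hamming(n):
    """Hamming window coefficients of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..n//2 of a real-valued sequence."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


def windowed_spectra(values, size=50, hop=25, use_hamming=False):
    """Apply the DFT to subsequences of `size` values that start every `hop` values."""
    window = hamming(size) if use_hamming else [1.0] * size
    spectra = []
    for start in range(0, len(values) - size + 1, hop):
        chunk = values[start:start + size]
        spectra.append(dft_magnitudes([v * w for v, w in zip(chunk, window)]))
    return spectra


# Hypothetical input: 100 values oscillating 5 times per 50-value window.
values = [math.cos(2 * math.pi * 5 * i / 50) for i in range(100)]
spectra = windowed_spectra(values)
```

With 100 input values the sketch yields three overlapping windows, each with its largest component in bin 5.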

At step 20 a Discrete Cosine Transform (DCT) is then applied to each spectrum resulting from the application of the DFT. FIG. 5 shows a spectrum for a cartoon sequence and also for a non-cartoon sequence. It can be seen that the spectrum for the rate of change values for the non-cartoon sequence shows fairly constant frequencies whereas the spectrum for the rate of change values for the cartoon sequence exhibits more high frequencies than low frequencies. Eight DCT coefficients are produced, although more or fewer could be used as will be discussed later.
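
The parameterisation of a spectrum by DCT coefficients might be sketched as follows. The embodiment does not state which form of the DCT is used; this sketch assumes the common unnormalised DCT-II, and the function name dct_coefficients is likewise an assumption.

```python
import math


def dct_coefficients(spectrum, count=8):
    """First `count` coefficients of the unnormalised DCT-II of a spectrum."""
    n = len(spectrum)
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(spectrum))
            for k in range(count)]


# Hypothetical flat spectrum of 26 bins: only the first coefficient is non-zero.
coeffs = dct_coefficients([2.0] * 26)
```

Under this assumed form the first coefficient is simply the sum of the spectrum values, while the later coefficients describe the shape of the spectrum.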

Finally, at step 22 eleven DCT coefficient vectors are used to classify the sequence as a cartoon or as a non cartoon sequence. Any one of a number of classifiers could be used. In this embodiment of the invention the DCT coefficient vectors are classified using Gaussian Mixture Models, a description of which may be found in D. A. Reynolds, R. C. Rose and M. J. T. Smith, “PC-Based TMS320C30 Implementation of the Gaussian Mixture Model Text-Independent Speaker Recognition System”, pages 967-973, ICSPAT, DSP Associates, 1992.
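
One way such a classification could be carried out is sketched below: each class is represented by a Gaussian mixture, and the sequence is assigned to the class whose mixture gives the higher total log-likelihood over the feature vectors. The diagonal covariances, the likelihood comparison rule, the function names and the two-dimensional model parameters are all assumptions for illustration; training of the mixtures is not shown.

```python
import math


def log_gmm(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian mixture."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp over components for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))


def classify(vectors, cartoon_gmm, other_gmm):
    """Label a sequence of feature vectors by comparing summed log-likelihoods."""
    score_cartoon = sum(log_gmm(v, *cartoon_gmm) for v in vectors)
    score_other = sum(log_gmm(v, *other_gmm) for v in vectors)
    return "cartoon" if score_cartoon > score_other else "non-cartoon"


# Hypothetical models given as (weights, means, variances).
cartoon_gmm = ([0.5, 0.5], [[2.0, 0.0], [3.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
other_gmm = ([1.0], [[-2.0, 0.0]], [[1.0, 1.0]])
label = classify([[2.5, 0.5], [2.0, 0.2]], cartoon_gmm, other_gmm)
```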

FIG. 6 shows how the effectiveness of classification varies in dependence upon the number of DCT coefficients used. The first DCT coefficient, which provides a measure of the energy of the rate of change values, does not provide any useful information for distinguishing between cartoon sequences and non cartoon sequences. The second DCT coefficient provides a very useful measure, as the performance improves greatly once the second DCT coefficient is included in the classification. The performance improves up to eight coefficients and then remains much the same thereafter.

As shown in FIG. 7, and referring back to FIG. 2, a classification program 109 according to the invention comprises an image segmenter 70, which performs step 10 of FIG. 2, and a pixel rate of change calculator 71, which performs step 12. A combiner 81, which performs step 14, comprises a calculator 72 for determining the percentage of pixels changed in a subimage and an adder 73 for summing the percentages of pixels changed in a plurality of subimages in order to generate a rate of change value for each image. Rate of change values for a plurality of images are stored in a buffer 75. A scene change filter 76 performs step 16 and filters the rate of change values from the buffer 75 in order to remove any spikes in a sequence of the rate of change values caused by a scene change. A discrete Fourier transformer 77, performing step 18 of FIG. 2, is used to provide a spectrum of a series of rate of change values, and then a discrete cosine transformer 78, performing step 20 of FIG. 2, is used to parameterise the resulting spectrum into a feature vector comprising eight values. The feature vectors are stored in a buffer 79, and finally a classifier 80 is used to classify a sequence of feature vectors as resulting from a cartoon sequence of images or as resulting from a non cartoon sequence of images.

In another embodiment of the invention, before the image is divided into subimages, camera motion is allowed for. In order to do so, a test portion of an image is compared with corresponding test portions of another image in a sequence of images. Thus correlation between portions is determined and camera motion may be estimated.
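
A simple way of comparing test portions is sketched below. It is not the correlation measure of the embodiment, whose details are not given; instead it substitutes a sum of absolute differences as the matching score and searches a small range of displacements. The function name estimate_shift, the grey-level image representation and the search range are assumptions of this sketch.

```python
def estimate_shift(prev, curr, box, max_shift):
    """Return the (dx, dy) that minimises the sum of absolute differences
    between the test portion of prev and the displaced portion of curr.

    prev and curr are 2-D lists of grey values; box is (x0, y0, x1, y1)
    with exclusive upper bounds.
    """
    x0, y0, x1, y1 = box
    height, width = len(curr), len(curr[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Skip displacements that move the test portion outside the image.
            if x0 + dx < 0 or x1 + dx > width or y0 + dy < 0 or y1 + dy > height:
                continue
            cost = sum(abs(prev[y][x] - curr[y + dy][x + dx])
                       for y in range(y0, y1) for x in range(x0, x1))
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


# Hypothetical 8 x 8 image, and a copy displaced by 1 pixel right and 2 down.
prev = [[x + 10 * y for x in range(8)] for y in range(8)]
curr = [[prev[y - 2][x - 1] if y >= 2 and x >= 1 else 0 for x in range(8)]
        for y in range(8)]
shift = estimate_shift(prev, curr, box=(2, 2, 5, 5), max_shift=2)
```

The estimated displacement could then be used to align corresponding subimages before the pixel differences are calculated.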

As will be understood by those skilled in the art, the image classification program 109 can be contained on various transmission and/or storage mediums such as a floppy disc, CD-ROM, or magnetic tape so that the program can be loaded onto one or more general purpose computers. The program 109 can also be downloaded over a computer network using a suitable transmission medium.

Whilst the invention has been described with reference to a signal representing an image comprising a plurality of pixels, it will be appreciated that the method may equally well be performed on images for which the original source of the image does not represent the image as a plurality of pixels.

Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise”, “comprising” and the like are to be construed in an inclusive as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”.

1. A method for characterising a sequence of images represented by a plurality of pixels whose intensity and/or colour can change with time, each pixel having a pixel value indicative of the intensity and/or colour of the pixel, the method comprising the steps of: (i) determining the temporal rate of change of the pixel value for each pixel of a group of pixels; (ii) combining the determined rates of change for each of the pixels so as to provide a combined rate of change value for the sequence of images, the combined rate of change having a plurality of temporal frequency components associated therewith; (iii) determining the size of at least some of the temporal frequency components associated with the combined rate of change; and, (iv) characterising the sequence of images in dependence upon the sizes of the frequency components.

2. A method according to claim 1 further comprising the step of partitioning the image into a plurality of subimages; and in which the combining step (ii) comprises the sub steps of combining the determined rate of change for the plurality of pixels in a subimage to provide a subimage rate of change; and subsequently combining said subimage rates of change to provide said value.

3. A method according to claim 1, in which the rate of change of the value for each pixel is determined by calculating the difference between the value for a pixel for one image and the value of a corresponding pixel for a previous image.

4. A method according to claim 1, in which the combining step includes the sub step of determining the proportion of pixels in an image of the sequence which have a value which is substantially different from the value of the corresponding pixel in a previous image of the sequence.

5. A method according to claim 1, in which a Fourier transform is used in order to obtain the sizes of the frequency components of the combined rate of change.

6. A method according to claim 5, in which a discrete cosine transform is used to provide a plurality of values which characterise the spectrum.

7. A method according to claim 1, wherein the sequence of images is characterised as either being an animated cartoon or not being an animated cartoon.

8. Apparatus for characterising a sequence of images represented by a plurality of pixels, each pixel having a pixel value indicative of its intensity and/or colour, the apparatus comprising: means for determining the temporal rate of change of the pixel value for each pixel of a group of pixels; means for combining the determined rates of change for each of the pixels so as to provide a combined rate of change value for the sequence of images, the combined rate of change having a plurality of temporal frequency components associated therewith; means for determining the size of at least some of the temporal frequency components associated with the combined rate of change; and, means for characterising the sequence of images in dependence upon the sizes of the frequency components.

9. An apparatus for determining whether a signal representing a sequence of images represents an animated cartoon, the apparatus comprising means (71) for determining the rate of change of the value for a pixel in an image, the value being representative of the intensity and/or colour of the pixel; means (81) for combining the determined rate of change to provide a combined rate of change; means (77) for determining the sizes of the frequency components of the combined rate of change; and means (80) for classifying the signal in dependence upon said sizes of the frequency components.

10. An apparatus according to claim 9 further comprising a segmenter (70) for partitioning an image of the sequence into a plurality of subimages.

11. An apparatus according to claim 9, in which the combiner comprises means (72) for determining the proportion of pixels in an image which are substantially different from the value of the corresponding pixel in a previous image of the sequence.

12. An apparatus according to claim 9, further comprising a discrete cosine transformer (78) for characterising the spectrum.

13. A data carrier loadable into a computer and carrying instructions for causing the computer to carry out the method according to claim 1.

14. A data carrier loadable into a computer and carrying instructions for enabling the computer to provide the apparatus according to claim 9.

15. A method for classifying whether a sequence of images represents a cartoon, in which each image comprises a plurality of pixels, each pixel having a value representative of the intensity and/or colour of the pixel, the method comprising the steps of: for a plurality of pixels in an image, determining the rate of change of the value for each pixel for a plurality of images of a sequence of images; combining the determined rate of change of the plurality of pixels to provide a combined rate of change value for said plurality of images; determining the sizes of the frequency components of the combined rate of change value; and classifying the sequence of images in dependence upon said sizes of the frequency components.